\cleardoublepage
\chapternonum{Abstract}
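Influence source identification is an important problem in the analysis of influence diffusion in social networks. It aims to infer the origin of an information cascade by observing how the information spreads among social network users, and it is significant for applications and research areas such as public opinion early warning, rumor control, and epidemic tracing.
Existing source identification methods that rely on a prior propagation model lack interpretability in implicit scenarios where the propagation model is unknown, and they cannot identify sources effectively in those scenarios.
Most methods that do not rely on a prior propagation model, in turn, fail to adequately represent the topological information of graph nodes and the temporal information of the snapshot sequences obtained from multiple observations.
In addition, most existing methods are limited to source identification on small-scale social network graphs and are difficult to extend to large-scale ones.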

To address these problems, this thesis makes the following contributions:
\begin{enumerate}
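    \item It proposes RSPSI (Reverse Sequence Prediction Source Identification), a method that runs on small-scale data and identifies sources with a reverse sequence prediction model. RSPSI consists of several key modules, including multi-dimensional input feature extraction with graph neural network representation, multi-dimensional feature fusion based on the MLP-Mixer model, and reverse sequence prediction based on the Transformer model.
    \item Building on RSPSI, it proposes CRSPSI (Cluster-RSPSI), a method based on subgraph processing and a weight discrimination model. CRSPSI scales to large data by partitioning the graph into subgraphs and using the subgraph as the unit of computation for training and prediction. It uses a subgraph weight discrimination model to represent community features in the graph data and to balance numerical differences among independent subgraph tasks, and it implements an offloading-based GPU-CPU collaboration mechanism that makes effective use of the computing power of consumer-grade devices and speeds up overall training and prediction.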
\end{enumerate}

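Following these design ideas, this thesis constructs concrete architectures of the two source identification models for two scenarios: supervised learning, where the training set contains known propagation sources (RSPSI-S and CRSPSI-S), and self-supervised learning, where no propagation sources are known in advance (RSPSI-E and CRSPSI-E).
Finally, the proposed models are compared with state-of-the-art baseline models on real-world public datasets, which demonstrates their effectiveness. Ablation experiments that verify the key design choices, comparative experiments on model parameters and implementation choices, and performance experiments on the efficiency of the engineering implementation further verify the interpretability, scalability, and efficiency of the models.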

\textbf{Keywords}: Social Network, Source Identification, Graph Neural Network, Time Series Prediction
\cleardoublepage
\chapternonum{Abstract}
Influence source identification is an important problem in the analysis of influence diffusion in social networks. It aims to locate the source nodes of a diffusion process by observing how influence spreads through the network, and it has practical significance for applications such as public opinion early warning, rumor control, and epidemic tracing.
Existing source identification methods that rely on prior propagation models lack interpretability and suffer a substantial drop in performance in implicit scenarios where the propagation model is unknown. Methods that do not rely on prior propagation models, in turn, mostly fail to adequately capture the topological information of graph nodes and the temporal information embedded in sequences of observed snapshots. Furthermore, most existing methods work only on small-scale social network datasets and do not scale to large social networks.



To address the aforementioned challenges, this thesis makes the following contributions:

\begin{enumerate}
    \item It proposes the RSPSI (Reverse Sequence Prediction Source Identification) method, designed for source identification on small-scale data using a reverse sequence prediction model. RSPSI comprises several key components: a multi-dimensional feature extraction and graph neural network representation module, a multi-dimensional feature fusion module based on the MLP-Mixer model, and a reverse sequence prediction module based on the Transformer model.
    \item Building on RSPSI, it further introduces the CRSPSI (Cluster-RSPSI) method, which scales to large data by partitioning a large graph into multiple subgraphs and treating each subgraph as an independent unit for training and prediction. CRSPSI incorporates a subgraph weight discrimination model to represent community features within the graph data and to balance numerical differences among independent subgraph tasks. It also implements an offloading-based GPU-CPU collaboration mechanism that makes effective use of consumer-grade hardware and speeds up overall training and prediction.
\end{enumerate}

Based on these design principles, this thesis implements two sets of source identification models: RSPSI-S and CRSPSI-S for supervised learning scenarios, where the training set contains known propagation sources, and RSPSI-E and CRSPSI-E for self-supervised learning scenarios, where no propagation sources are known in advance.
Finally, this thesis demonstrates the effectiveness of the proposed models through comparative experiments against state-of-the-art baseline models on real-world public datasets. The interpretability, scalability, and efficiency of the models are further verified through ablation studies of the key design components, comparative experiments on parameter and implementation choices, and performance experiments on the efficiency of the engineering implementation.

\textbf{Keywords}: Social Network, Source Identification, Graph Neural Network, Time Series Prediction